Abstract

In this paper, we address the general case of a coordinated secondary network willing to exploit
communication opportunities left vacant by a licensed primary network. Since secondary users (SUs)
usually have no prior knowledge of the environment, they need to learn the availability of each channel
through sensing techniques, which can however be prone to detection errors. We argue that cooperation
among secondary users can enable efficient learning and coordination mechanisms in order to maximize
the spectrum exploitation by SUs, while minimizing the impact on the primary network. To this end, we
provide three novel contributions in this paper. First, we formulate the spectrum selection in secondary
networks as an instance of the Multi-Armed Bandit (MAB) problem, and we extend the analysis to the
collaborative learning case, in which each SU learns the spectrum occupation and shares this information
with other SUs. We show that collaboration among SUs can mitigate the impact of sensing errors on
system performance, and improve the convergence of the learning process to the optimal solution. Second,
we integrate the learning algorithms with two collaboration techniques based on modified versions of the
Hungarian algorithm and of the Round Robin algorithm, which greatly reduce the interference
among SUs. Third, we derive fundamental limits to the performance of cooperative learning algorithms
based on Upper Confidence Bound (UCB) policies in a symmetric scenario where all SUs have the
same perception of the quality of the resources. Extensive simulation results confirm the effectiveness
of our joint learning-collaboration algorithm in protecting the operations of Primary Users (PUs), while
maximizing the performance of SUs.
I. Introduction

The general concept of Opportunistic Spectrum Access (OSA) defines two types of users: 
primary users (PUs) and secondary users (SUs). PUs access spectrum resources dedicated to the 
services provided to them, while SUs refer to a pool of users willing to exploit the spectrum 
resources unoccupied by PUs, at a particular time in a particular geographical area, referred to
as communication opportunities [1], [2].

The detection of opportunities and their exploitation in secondary networks can be challenging. 
On the one hand, the secondary users can have different perceptions of the same opportunity
depending on their observation abilities. Thus, a channel available with high probability,
offering substantial communication opportunities, could be discarded by a SU unable to properly
detect PUs' activity. On the other hand, several SUs can be competing for the same resources.
Consequently, high interference can occur among them, degrading the observed quality of the
resources and the realized performance of the secondary network.

This study addresses the spectrum allocation problem in secondary networks, through the key 
concepts of learning, collaboration and coordination. In order to implement the OSA paradigm 
in an efficient way, the SUs must be able to detect the communication opportunities left
vacant by incumbent users. Since usually no prior knowledge is available on the occupancy
pattern of the channels, learning abilities are needed. Several machine learning-based techniques
have been proposed for spectrum allocation in secondary networks. Among these, Multi-Armed
Bandit (MAB) techniques [3] have gained increasing interest, due to the possibility to derive
theoretical bounds on the performance of optimal learning algorithms. However, the impact of
individual sensing errors on the convergence of the learning algorithm is far from being completely
explored. For this reason, in this paper we consider a collaborative network environment, where
the secondary users can collaborate and share the information learnt on the occupancy pattern of
the channels. Collaboration is a key element in Cognitive Radio (CR) networks [4], [5]. Here,
we investigate if and how the utilization of collaborative techniques can enhance the performance
of the learning schemes, in order to enable secondary users to fully and quickly exploit vacant
resources. At the same time, while collaborative learning is fundamental to mitigate the impact of
PU interference, coordination among SUs is required to guarantee optimal sharing of spectrum
resources and to mitigate SU interference. The coordinator entity can be either real or virtual,
but it should guarantee that, in the optimal configuration, a single SU is allocated per channel.

In this paper, we introduce and analyze a joint coordination-learning mechanism. We state 
that the suggested mechanism enables secondary networks to deal with dynamic and uncertain
environments in spite of sensing errors. We propose three novel contributions in this paper.
First, we formulate the spectrum allocation problem in secondary networks as a special instance
of the Multi-Armed Bandit (MAB) problem and we propose to solve it through algorithms
derived from Upper Confidence Bound (UCB) policies [6]-[8]. Compared to previous applications
of MAB techniques to OSA issues, we address the case of cooperative learning, i.e. SUs
share their rewards in order to speed up the convergence of the learning algorithm to the optimal
solution. Second, while learning PUs' occupation patterns of each spectrum band, we consider
two general coordination algorithms whose purpose is to allocate at every iteration a unique
SU per channel, in order to nullify the interference among SUs. The coordination algorithms
rely on a modified Hungarian algorithm [9] and a Round Robin algorithm, respectively, and
our modifications aim at providing a fair allocation of the resources. Third, we derive some
fundamental results on the performance of collaborative learning schemes for spectrum selection
in secondary networks. More specifically, we demonstrate that, in a symmetric scenario where all
SUs have the same perception of the quality of the resources (yet with sensing errors)^1, the UCB1
algorithm can efficiently learn to access optimal solutions even without prior knowledge of the
sensors' performance. Both results, in the case of symmetric and non-symmetric environments,
are illustrated through extensive simulations.

The rest of this paper is organized as follows. 

Section II discusses the works related to this paper and found in the open literature. Section
III details the OSA framework considered in this paper. To deal with uncertainty, a collaborative
learning mechanism is proposed in Section IV. The considered coordination mechanisms are
modeled as instances of Job Assignment problems, and are detailed in Section V. The theoretical
analysis of the joint learning-coordination framework is discussed in Section VI. Section VII
describes the collaboration mechanisms involved in this OSA context. Finally, Section VIII
empirically evaluates the introduced coordination and learning mechanisms, while Section IX
concludes the paper.



1 We refer to this scenario as the symmetric or homogeneous scenario in the following.
II. Related work

Several authors have already proposed to borrow algorithms from the machine learning community
to design strategies for SUs that can successfully exploit available resources. We focus
this brief overview on MAB related models applied to OSA problems.

To the best of our knowledge, the first extensive work that tackles spectrum band allocation
under uncertainty applied to OSA was presented in [6]. The paper presented various models
where a single or multiple secondary user(s) aim(s) at opportunistically exploiting available
frequency bands. Among other models, a MAB model was suggested in the case of perfect
sensing (i.e., the state of a sensed channel is acquired without errors). The authors of [6] suggested
the use of the UCB1 algorithm and extended its results to the case of multi-channel selection
by a single user. The case of multiple secondary users was also discussed. However, a game theory
based approach was suggested to tackle the problem. Such approaches lead to asymptotic Nash
equilibria, which are known to be difficult to compute in practice.

Since then, several papers suggested MAB modeling to tackle OSA related problems. In
[7], [8], the authors compared the UCB1 and UCB-V algorithms [10], [11] in the context of OSA
problems, while [12], [13] suggested to tackle multi-secondary user OSA problems modeled
within a MAB framework. The algorithm analyzed in [12], [13] was borrowed from [14]. This
algorithm is designed for observations drawn from Bernoulli distributions and is known to be
asymptotically optimal in the case of a single user. Thus, to adapt it to OSA contexts, they
first extended the results to multiple users. They then proved that a mild modification of the algorithm,
which takes into account the frequencies of the errors (i.e., false alarms and miss detections), maintains
the order optimality of their approach. Finally, they also considered the case of decentralized
secondary networks and proved their asymptotic convergence.

Taking the sensing errors into account is a fundamental step to achieving realistic OSA
models. However, considering that the error frequencies are perfectly known can be limiting
in some scenarios [15]-[17]. In [18], [19], the authors showed that UCB1 does not require prior
knowledge of the sensors' performance to converge. However, the authors showed that the loss
of performance is twofold. On the one hand, false alarms (i.e., detection of a signal while the
band is free) lead to missing communication opportunities. On the other hand, they also lead
to slower convergence rates to the optimal channel. Relying on these results, the authors of [20]
provided complex empirical evaluations to estimate the benefit of UCB1 combined with various
multi-user learning and coordination mechanisms (such as the softmax-UCB approach for instance).

Within a similar context, an interesting contribution can be found in [21]. The authors analyzed, in the
case of errorless sensing, the performance of UCB1 algorithms in the context of several secondary
users competing to access the primary channels. No explicit communication or collaboration is
considered in this scenario, yet, once again, UCB algorithms are proven to be efficient to handle
this scenario and to have an order optimal behavior.

All the papers mentioned above consider a homogeneous environment (or sensing). Namely, the
error frequencies for all users and through all channels are the same. An exception can be found
in [22], [23], which provide a general heterogeneous framework. It is worth
mentioning that these papers do not consider a specific OSA framework. They rather consider
that the observed expected quality of a resource can differ among users. Consequently, the suggested
model tackles multiple users in a general MAB framework rather than a specific application. The
model, referred to as the combinatorial MAB framework, is solved relying on a modified version
of UCB1 algorithms and the Hungarian algorithm.

The work [23] is the closest to the one provided within this paper. Unfortunately, since
their model presents a general framework, it does not explicitly take into account the impact of
sensing errors, nor does it show how the algorithm would perform in the case of collaborative
homogeneous networks. Moreover, the Hungarian algorithm was only introduced as a possible
optimization tool to solve their mathematical problem, but it was not considered from a network
coordination perspective. The latter aspect is addressed by this paper.

III. Network model 
In this section we detail the considered OSA framework. 

A. Primary Network 

The spectrum of interest is licensed to a primary network providing N independent channels.
We denote by $n \in \mathcal{D} = \{1, \dots, N\}$ the n-th channel. Every channel n can appear, when observed,
in one of these two possible states {idle, busy}. In the rest of the paper, we associate the numerical
value 0 to a busy channel and 1 to an idle channel. The temporal occupancy pattern of every
channel $n \in \mathcal{D}$ is thus supposed to follow an unknown Bernoulli distribution $\theta_n$. Moreover, the
distributions $\Theta = \{\theta_1, \theta_2, \dots, \theta_N\}$ are assumed to be stationary.

In this paper we tackle the particular case where PUs are assumed to be synchronous and
the time $t = 0, 1, 2, \dots$ is divided into slots. We denote by $S_t$ the channels' state at the slot
number t: $S_t = \{S_{1,t}, \dots, S_{N,t}\} \in \{0, 1\}^N$. For all $t \in \mathbb{N}$, the numerical value $S_{n,t}$ is assumed
to be an independent random realization of the stationary distribution $\theta_n \in \Theta$. Moreover, the
realizations $\{S_{n,t}\}_{t \in \mathbb{N}}$ drawn from a given distribution $\theta_n$ are assumed to be independent and
identically distributed. The expected availability of a channel is characterized by its probability
of being idle. Thus, we define the availability $\mu_n$ of a channel n, for all t, as:

$$\mu_n = E_{\theta_n}[S_{n,t}] = P(\text{channel } n \text{ is free}) = P(S_{n,t} = 1)$$
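
As a minimal sketch of this channel model, the following Python snippet draws i.i.d. Bernoulli channel states and estimates the availabilities by averaging. The function names, the availability values and the seed are illustrative assumptions and do not come from the paper.

```python
import random

def draw_channel_states(mu, rng):
    """Draw one slot S_t: entry n is 1 (idle) with probability mu[n], else 0 (busy)."""
    return [1 if rng.random() < m else 0 for m in mu]

def empirical_availability(mu, num_slots, seed=0):
    """Estimate mu_n = P(S_{n,t} = 1) by averaging i.i.d. slot realizations."""
    rng = random.Random(seed)
    idle_counts = [0] * len(mu)
    for _ in range(num_slots):
        for n, s in enumerate(draw_channel_states(mu, rng)):
            idle_counts[n] += s
    return [c / num_slots for c in idle_counts]

if __name__ == "__main__":
    mu = [0.9, 0.5, 0.2]  # hypothetical availabilities of N = 3 channels
    print(empirical_availability(mu, num_slots=20000))
```

With enough slots, the printed estimates should approach the assumed values of mu, which is the stationarity assumption at work.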

B. Secondary Users model 

We detail in this subsection the generic characteristics of all considered SUs. 

We consider K SUs denoted by the index $k \in \mathcal{K} = \{1, \dots, K\}$. At every slot number t, the
SU has to choose a channel to sense. To do so, the SU relies on the outcome of past trials.
We denote by $i_t^{(k)}$ the information gathered until the slot t by the k-th SU. We assume that
all SUs can only sense and access one channel per slot. Thus selecting a channel by a SU k
can be seen as an action $a_t^{(k)} \in \mathcal{A}$, where the set of possible actions $\mathcal{A} \subseteq \mathcal{D} = \{1, 2, \dots, N\}$
refers to the set of channels available. In this paper, all SUs collaborate through a coordination
mechanism described in Section V. The latter, through either a centralized or decentralized
approach, allocates at every iteration t a different channel to each SU.

The outcome of the detection phase is denoted by the binary random variable $X_t^{(k)} \in \{0, 1\}$,
where $X_t^{(k)} = 0$ denotes the detection of a signal by the k-th SU and $X_t^{(k)} = 1$ the absence of a
signal, respectively. In the case of perfect sensing, $X_t^{(k)} = S_{a_t^{(k)},t}$ for all SUs, where $a_t^{(k)}$ refers
to the channel selected at the slot number t. However, since we assume that sensing errors can
occur, the value of $X_t^{(k)}$ depends on the accuracy of the detector, characterized through the measure
of two types of errors. On the one hand, detecting a PU on the channel when it is free is usually
referred to as a false alarm. On the other hand, assuming the channel free when a PU is occupying
it is usually referred to as a miss detection. Let us denote by $\epsilon_n^{(k)}$ and $\delta_n^{(k)}$, respectively, the probability
of false alarm and the probability of miss detection characterizing the observation of a channel
$n \in \mathcal{D}$ by the k-th SU:

$$\epsilon_n^{(k)} = P(X_t^{(k)} = 0 \mid S_{a_t^{(k)},t} = 1)$$
$$\delta_n^{(k)} = P(X_t^{(k)} = 1 \mid S_{a_t^{(k)},t} = 0)$$
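
The two error probabilities can be illustrated with a small simulated detector. This is only a sketch under assumed values: the names `sense` and `empirical_error_rates`, and the numerical error rates, are hypothetical, and the paper does not specify how the detector itself is built.

```python
import random

def sense(state, eps, delta, rng):
    """Return X in {0, 1}: 1 means 'sensed free', 0 means 'signal detected'.

    state: true channel state S (1 = idle, 0 = busy)
    eps:   false alarm probability    P(X = 0 | S = 1)
    delta: miss detection probability P(X = 1 | S = 0)
    """
    if state == 1:
        return 0 if rng.random() < eps else 1
    return 1 if rng.random() < delta else 0

def empirical_error_rates(eps, delta, trials, seed=0):
    """Estimate both error rates by sensing idle, then busy, channels."""
    rng = random.Random(seed)
    false_alarms = sum(1 - sense(1, eps, delta, rng) for _ in range(trials))
    misses = sum(sense(0, eps, delta, rng) for _ in range(trials))
    return false_alarms / trials, misses / trials

if __name__ == "__main__":
    print(empirical_error_rates(eps=0.1, delta=0.05, trials=50000))
```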

Finally, the outcome of the sensing process can be seen as the output of a random policy
$\pi^{(k)}(\epsilon_n^{(k)}, \delta_n^{(k)}, S_{a_t,t})$ such that $X_t^{(k)} = \pi^{(k)}(\epsilon_n^{(k)}, \delta_n^{(k)}, S_{a_t,t})$. The design of such policies is
however out of the scope of this paper. Depending on the sensing outcome $X_t^{(k)} \in \{0, 1\}$, the
SU can choose to access the channel or not. The access policy chosen in this paper can be
described as: "access the channel if sensed available", i.e. if $X_t^{(k)} = 1$.

Notice that we assume the SUs' detectors to be designed such that for all $k \in \mathcal{K}$ and $n \in \mathcal{D}$,
$\delta_n^{(k)}$ (respectively $\epsilon_n^{(k)}$) is smaller than or equal to a given interference level allowed by the primary
network (respectively, smaller than or equal to a given level desired by the SU), although $\{\epsilon_n^{(k)}, \delta_n^{(k)}\}$
are not necessarily known. Moreover, we assume that a packet $D_t = 1$ is sent for every
transmission attempt. If interference occurs, it is detected and the transmission of the secondary
user fails. Regardless of the channel access policy, when the channel access is granted, the SU
receives a numerical acknowledgment. This feedback, usually referred to as reward $r_t^{(k)}$ in the
Machine Learning literature, informs the SU of the state of the transmission {succeeded, failed}.
In our scheme, we assume a cooperative scheme is used, i.e. every secondary user shares its
reward with the other SUs. All shared information as well as the used communication interface
are further discussed in Section VII.

IV. Learning Mechanism 

A. Joint Resource Allocation-Learning Algorithm

The learning mechanism aims at exploiting all gathered information to evaluate the most
promising resources. Thus, the performance of a learning mechanism highly depends on the
sampling model of the rewards (deterministic, stochastic or adversarial for instance). In the case
of a stochastic sampling as defined in Section III, we exploit UCB1 learning mechanisms first
proposed in [6], since they have been proven in [7], [18] to be efficient in OSA environments, while
having a very low implementation complexity.

The estimation of the performance of a resource $n \in \mathcal{D}$ considered by UCB1 indexes relies
on the computation of the average reward provided by that resource until the iteration t, to which
a positive bias is added. The usual form of UCB1 indexes is the following:

$$B_{T_n}(t) = \bar{W}_{T_n}(t) + A_{T_n}(t) \qquad (1)$$

where $A_{T_n}(t)$ is an upper confidence bias added to the sample mean $\bar{W}_{T_n}(t)$ of the resource n
after being selected $T_n(t)$ times at the step t:

$$\bar{W}_{T_n}(t) = \frac{\sum_{m=0}^{t-1} r_m \mathbb{1}_{\{a_m = n\}}}{T_n(t)}$$

For that purpose, we define $B_{T_n}^{(k)}(t)$ as the computed index associated to a resource n observed
$T_n$ times by the k-th decision maker until the iteration t, and $A_{T_n}^{(k)}(t)$ its associated bias.

Let B(t) refer to a K by N matrix such that the component $\{B(t)\}_{\{k,n\}} = B_{T_n}^{(\tilde{k})}(t)$, where
$k \in \mathcal{K}$, $n \in \mathcal{D}$ and:

$$\tilde{k} = ((k - 1 + t) \oslash K) + 1$$

where $a \oslash b$ denotes the remainder of the division of a by b. The form of B(t) is explicitly
designed to ensure fairness among SUs. As a matter of fact, the
rows of B(t) switch at every iteration in a Round Robin way. For the rest of this paper, B(t) is
the considered estimated weight matrix for coordination algorithms.
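
The index computation and the row rotation can be sketched as follows. The bias is assumed here to take the standard UCB1 form sqrt(alpha * ln(t) / T_n(t)), which this section does not state explicitly. The function names and the 0-based indexing are illustrative choices.

```python
import math

def ucb1_index(reward_sum, pulls, t, alpha=1.2):
    """UCB1(alpha) index: sample mean plus an upper confidence bias.

    The bias sqrt(alpha * ln(t) / pulls) is the standard UCB1 expression
    (an assumption here). Requires t >= 1. A resource that was never
    observed gets +inf so that it is explored first.
    """
    if pulls == 0:
        return math.inf
    return reward_sum / pulls + math.sqrt(alpha * math.log(t) / pulls)

def rotated_index_matrix(index_rows, t):
    """Build B(t): row k (0-based) holds the indexes of user (k + t) mod K.

    This is the 0-based counterpart of ((k - 1 + t) mod K) + 1, so the rows
    shift by one position at every iteration, in a Round Robin way.
    """
    K = len(index_rows)
    return [index_rows[(k + t) % K] for k in range(K)]

if __name__ == "__main__":
    print(ucb1_index(reward_sum=3.0, pulls=4, t=10))
    print(rotated_index_matrix([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]], t=1))
```

Rotating the rows does not change the set of optimal matchings, only which user is listed first, which is what lets a deterministic solver break ties differently over time.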



Channel Selection Policy 1 (CC-UCB1(R, α)): The overall algorithm can be described as
follows. Let R be a positive integer, with R = 1 for a heterogeneous network and R = K for a homogeneous
network.

Every R rounds: computation and coordination.

• Step 1: Compute B(t) using the UCB1(α) algorithm.

• Step 2: Compute the output of the coordination mechanism $a_t^{(k)}$ for all users k.

Thus, every SU is allocated R channels to access in a Round Robin fashion for the next R
iterations.

At every iteration during R rounds: sense and access the channels.

• Step 3 (for R iterations): Sense the channels and access them if sensed free.

At the end of R rounds: collaboration and information sharing.

• Step 4: Share the sensing-access outcomes of the last R rounds.
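
The four steps above can be sketched for the homogeneous case (R = K) as a short simulation. This is an illustrative sketch under several assumptions that are not taken from the paper: a common false alarm probability `eps`, the standard UCB1 bias sqrt(alpha * ln(t) / T_n), a coordination step that simply keeps the K channels with the largest shared index, and hypothetical function and variable names.

```python
import math
import random

def cc_ucb1_symmetric(mu, eps, K, num_blocks, alpha=1.2, seed=0):
    """Run num_blocks blocks of R = K slots; return the shared pull counts T_n."""
    rng = random.Random(seed)
    N = len(mu)
    pulls = [0] * N      # shared number of observations of each channel
    gains = [0.0] * N    # shared sum of rewards of each channel
    t = 0                # slot counter

    def index(n):
        # Step 1: UCB1(alpha) index; unexplored channels are tried first.
        # pulls[n] > 0 implies t >= K >= 1, so log(t) is well defined.
        if pulls[n] == 0:
            return math.inf
        return gains[n] / pulls[n] + math.sqrt(alpha * math.log(t) / pulls[n])

    for _ in range(num_blocks):
        # Step 2: coordination, keep the K channels with the largest index.
        best = sorted(range(N), key=index, reverse=True)[:K]
        outcomes = []
        # Step 3: during K slots, user k senses channel best[(k + s) % K].
        for s in range(K):
            for k in range(K):
                n = best[(k + s) % K]
                idle = rng.random() < mu[n]
                sensed_free = idle and rng.random() >= eps
                # A miss detection leads to a failed transmission (reward 0),
                # so only "idle and sensed free" yields a reward of 1.
                outcomes.append((n, 1.0 if sensed_free else 0.0))
            t += 1
        # Step 4: share the outcomes of the last K slots.
        for n, r in outcomes:
            pulls[n] += 1
            gains[n] += r
    return pulls

if __name__ == "__main__":
    print(cc_ucb1_symmetric([0.9, 0.8, 0.2, 0.1], eps=0.1, K=2, num_blocks=2000))
```

In such a run, most observations should concentrate on the K channels with the highest availability, while the remaining channels are only sampled occasionally.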
As shown by the Channel Selection Policy 1, the second step relies on a coordination mechanism
to perform channel allocation among the SUs. These mechanisms are usually equivalent to
Job Assignment problems. In the following, we introduce two coordination algorithms in order to
allow fair resource allocation among SUs: (i) the Hungarian algorithm based coordination and
(ii) the Round Robin based coordination.

V. General Resource Allocation Problem 

A. Coordination and Job Assignment Problems 

We argue in this subsection that the coordination of multiple secondary users can be formulated as
a job assignment problem. We first introduce the general notations related to the job assignment
framework. Then we present the latter as an adequate tool to model OSA related coordination
problems.

Let us consider a set $\mathcal{K}$ of K workers or decision makers and a set $\mathcal{D}$ of N jobs or resources.
Let us denote by $\Lambda$ the K by N weight (or cost) matrix where $\{\Lambda\}_{\{k,n\}} = \lambda_n^{(k)}$ refers to a weight
associated to the decision maker $k \in \mathcal{K}$ assigned to the job or resource $n \in \mathcal{D}$, such that:

$$\Lambda = \begin{pmatrix} \lambda_1^{(1)} & \cdots & \lambda_N^{(1)} \\ \vdots & \ddots & \vdots \\ \lambda_1^{(K)} & \cdots & \lambda_N^{(K)} \end{pmatrix}$$

We assume that every decision maker can be assigned to a unique resource. Moreover, every
resource can be handled by only one decision maker. Let $a_n \in \mathcal{K}$ refer to the decision
maker assigned to the resource $n \in \mathcal{D}$. The resource allocation problem can be formalized as follows.
Find an optimal set of assignments such that the total weight is maximized (or equivalently, the
total cost minimized):

$$\max_{\{a_1, \dots, a_N\}} \sum_{n=1}^{N} \lambda_n^{(a_n)} \mathbb{1}_{\{\exists a_n\}} \qquad (4)$$

where the logic expression $\{\exists a_n\}$ refers to the existence of a decision maker assigned to the
resource n.

2 Indicator function: $\mathbb{1}_{\{\text{logical\_expression}\}}$ = 1 if logical_expression is true; 0 if logical_expression is false.
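
For very small instances, the job assignment problem can be solved by exhaustive search, which makes the objective concrete. The snippet below is such a brute-force sketch, not the method used in the paper. The function name and the weight values are hypothetical.

```python
from itertools import permutations

def best_assignment(weights):
    """Exhaustive search over one-to-one assignments (requires K <= N).

    weights[k][n] is the weight of worker k on resource n.
    Returns (total, assignment) where assignment[k] is the resource of worker k.
    Resources left without a worker contribute nothing to the total.
    """
    K, N = len(weights), len(weights[0])
    best_total, best_assign = float("-inf"), None
    for resources in permutations(range(N), K):
        total = sum(weights[k][n] for k, n in enumerate(resources))
        if total > best_total:
            best_total, best_assign = total, resources
    return best_total, best_assign

if __name__ == "__main__":
    weights = [[0.9, 0.5, 0.1],   # hypothetical weights of worker 0
               [0.8, 0.7, 0.2]]   # hypothetical weights of worker 1
    print(best_assignment(weights))
```

The search visits N!/(N-K)! candidate assignments, which is why a polynomial-time matching algorithm is preferable beyond toy sizes.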
In the case of OSA, a coordinator generally aims at canceling harmful interference among
the SUs. To that purpose, a coordinator usually allocates different resources to different users,
or uses advanced signal processing techniques to alleviate interference effects on the users'
performance (e.g., Time Division Multiple Access, Frequency Division Multiple Access or
Code Division Multiple Access, to name a few).

Definition 1 (Coordinator or Facilitator): Let $\mathcal{K}$ refer to a set of decision making agents. We
refer to as Coordinator or Facilitator any real or virtual entity that enables the different decision
makers to jointly plan their decisions at every iteration.

For consistency of terminology, let us consider a set $\mathcal{K}$ of K SUs (viz., the workers
or decision makers) willing to exploit a set $\mathcal{D}$ of N primary channels (viz., the resources).
Moreover, let $\{\mu_n\}_{\{n \in \mathcal{D}\}}$ and $\pi^{(k)}$ denote, respectively, a characteristic measure that quantifies
the quality of the primary channels (e.g., their expected availability or Signal to Noise Ratio for
instance) and a sensing policy that characterizes the observation abilities of the k-th SU. Then
$\lambda_n^{(k)} = f_{\pi^{(k)}}(\mu_n)$ represents the quality of a primary resource observed by the k-th SU, where $f(\cdot)$
represents a (possibly implicit) functional relationship that relates primary resources' quality to
SUs' observations. Consequently, the stated problem in Equation (4) is equivalent, when allocating
primary resources among SUs, to maximizing the secondary network's observed performance.

B. Coordination Mechanisms based on The Hungarian Algorithm 

Suggested in 1955 by H. W. Kuhn [9], the Hungarian method is a matching algorithm that
solves the job assignment problem in polynomial time. It mainly takes as an input the matrix
$\Lambda$ (or its opposite, depending on whether it is a maximization or minimization approach) and
provides as an output a binary matrix that contains a unique 1 per row and per column. This
output indicates the resource allocation to the workers.

Many assignment combinations can verify the stated problem in Equation (4). The Hungarian
algorithm provides one solution among the set of optimal matching solutions. This solution
mainly depends on the matrix $\Lambda$. Swapping two columns can lead to a different optimal solution
if such a solution exists. It is thus necessary to consider, for fairness reasons among SUs, a
permutation mechanism that changes the order of the rows of the weight matrix at every new
iteration t. To this end, we introduce the following coordination algorithm:




Coordination 1 (Hungarian Algorithm based Coordination): Let t = 0, 1, 2, ... refer to a
discrete sampling time and let $\{\lambda_n^{(k)}(t)\}_{n \in \mathcal{D}}$ refer to the weights associated to the decision maker
$k \in \mathcal{K}$ at the iteration t. Let $\Lambda(t)$ refer to a K by N matrix such that $\{\Lambda(t)\}_{\{k,n\}} = \lambda_n^{(\tilde{k})}(t)$,
$k \in \mathcal{K}$, $n \in \mathcal{D}$ and:

$$\tilde{k} = ((k - 1 + t) \oslash K) + 1$$

where $a \oslash b$ refers to the modulo operator that returns the remainder of the division of a by b.
Let H(t) refer to the output of the Hungarian algorithm with input $\Lambda(t)$.
Then the k-th decision maker is assigned the resource $a_t^{(k)}$ verifying:

$$a_t^{(k)} = n \ \text{s.t.} \ H(t)_{\{\tilde{k},n\}} = 1 \qquad (5)$$
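
A self-contained sketch of this coordination step is given below. It uses a textbook potentials-based implementation of the Hungarian method, which may differ in detail from the variant used by the authors, and it assumes one particular reading of Equation (5): the user stored in row i of the rotated matrix receives the channel matched to that row. All function names are hypothetical and indices are 0-based.

```python
def hungarian_max(weights):
    """Maximum-weight assignment for a K x N matrix with K <= N.

    Returns cols, where cols[i] is the column matched to row i.
    """
    K, N = len(weights), len(weights[0])
    cost = [[-w for w in row] for row in weights]  # maximize = minimize negated weights
    INF = float("inf")
    u = [0.0] * (K + 1)      # row potentials (1-based)
    v = [0.0] * (N + 1)      # column potentials (1-based)
    match = [0] * (N + 1)    # match[j] = row matched to column j, 0 if free
    way = [0] * (N + 1)      # predecessor column on the augmenting path
    for i in range(1, K + 1):
        match[0] = i
        j0 = 0
        minv = [INF] * (N + 1)
        used = [False] * (N + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = match[j0], INF, 0
            for j in range(1, N + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(N + 1):
                if used[j]:
                    u[match[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if match[j0] == 0:
                break
        while j0:  # flip the matching along the augmenting path
            j1 = way[j0]
            match[j0] = match[j1]
            j0 = j1
    cols = [0] * K
    for j in range(1, N + 1):
        if match[j]:
            cols[match[j] - 1] = j - 1
    return cols

def coordinate(user_weights, t):
    """Rotate the rows by t, solve the matching, and map rows back to users."""
    K = len(user_weights)
    order = [(i + t) % K for i in range(K)]   # row i holds user order[i]
    cols = hungarian_max([user_weights[k] for k in order])
    allocation = [0] * K
    for i, k in enumerate(order):
        allocation[k] = cols[i]
    return allocation

if __name__ == "__main__":
    weights = [[0.9, 0.5, 0.1], [0.8, 0.7, 0.2]]  # hypothetical user weights
    print(coordinate(weights, t=0), coordinate(weights, t=1))
```

When the optimal matching is unique, the rotation leaves the allocation unchanged. It only matters when several matchings share the same total weight, as in the symmetric case.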



C. Coordination Mechanisms based on Round Robin Algorithm 

We consider in this subsection a particular case of the introduced job assignment problem:
symmetric workers.

Definition 2 (Symmetric Behavior): Let $\mathcal{K}$ refer to a set of decision making agents. These
agents are said to have a Symmetric Behavior if their optimization criteria, their communication
abilities as well as their decision making policies are the same. In OSA contexts, a network with
Symmetric Behavior transceivers is thus referred to as a Symmetric Network.

This can be formalized as a particular weight matrix with the same rows for all $k \in \mathcal{K}$, i.e., let
n be a resource, $n \in \mathcal{D}$, then:

$$\forall k \in \mathcal{K}, \quad \{\Lambda\}_{\{k,n\}} = \lambda_n$$

In this context, a very simple coordination algorithm can ensure fairness among workers^3:

Coordination 2 (Circular Coordination (Round Robin)): Let t = 0, 1, 2, ... refer to a discrete
sampling time. We define $t' = \{0, 1, \dots, \lfloor t/K \rfloor\}$ as a sequence of integers updated every
K iterations. Let $\{\Lambda(t)\}$ refer to the weight matrix computed at the iteration t, and let $\sigma_n(t)$
be the permutation function used at the iteration t to order the rows of the weight matrix
values. We assume that $\{\Lambda(t)\}$ and $\sigma_n(t)$ are computed every K iterations such that for all
$t \in [Kt', K(t' + 1) - 1]$, $\{\Lambda(t)\} = \Lambda(Kt')$ and $\sigma_n(t) = \sigma_n(Kt')$.

Then the k-th decision maker selects the channel n verifying:

(6)

3 Although the suggested form is original, the coordination algorithm is a simple Round Robin allocation scheme. It has
already been suggested in [13] in an OSA context without considering collaboration.

Coordination algorithm 2 needs to know that the network is perfectly symmetric. In case this
knowledge is unavailable or the network is non-symmetric, this coordination scheme could fail.
Moreover, in real scenarios, the weight matrix is usually unknown and every worker can solely
access one row of the matrix, i.e. the one related to its own perception of the environment (usually
prone to detection errors). Consequently, OSA related problems appear as Job Assignment
problems under uncertainty.

Thus, we suggest in this paper to introduce collaboration and coordination based learning
mechanisms among SUs in order to alleviate the lack of information at each SU and to converge
to the optimal resource allocation. In order to compute an estimation of the weight matrix, we
assume a collaborative behavior among workers to share information (discussed in Section VII).
The shared information enables a learning mechanism to compute the estimated quality matrix as
described in Section IV. The performance of Coordination algorithm 1 is empirically analyzed
in Section VIII while, in the case of symmetric networks, the performance of Coordination
algorithm 2 is theoretically analyzed in Section VI.

VI. Theoretical analysis

A. Definitions of the Reward and the Expected Cumulated Regret

Since we consider a coordinated network, it is reasonable to assume that interference among SUs
is null. Thus, relying on the previously introduced notations and assumptions, the throughput
achieved by a SU k, $k \in \mathcal{K}$, at the slot number t can be defined as:

$$r_t^{(k)} = D_t S_{a_t^{(k)},t} X_t^{(k)} \qquad (7)$$

which is the reward considered in this particular framework, where $r_t^{(k)}$ equals 1 only if the
channel is free and the SU observes it as free. Consequently, the expected reward achievable by
a given secondary user SU k using a channel $a_t^{(k)} \in \mathcal{D}$ can be easily computed:



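$$E[r_t^{(k)}] = P(X_t^{(k)} = 1 \mid S_{a_t^{(k)},t} = 1)\, P(S_{a_t^{(k)},t} = 1) \qquad (8)$$

which is equal, in this case, to:

$$E[r_t^{(k)}] = (1 - \epsilon_{a_t^{(k)}}^{(k)})\, \mu_{a_t^{(k)}} \qquad (9)$$

To relate with the job assignment problem described in Section V, the weight matrix $\Lambda$ is
equal, in this context, to the matrix $\{E[r_t^{(k)}]\}_{\{k \in \mathcal{K}, n \in \mathcal{D}\}}$.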
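
The product form of the expected reward can be checked numerically. The following Monte Carlo sketch assumes a single channel with hypothetical values of the availability and of the error probabilities. A transmission over a busy channel is counted as a failure, so the miss detection probability does not contribute to the average reward.

```python
import random

def average_reward(mu, eps, delta, num_slots, seed=0):
    """Empirical mean of the reward on one channel with imperfect sensing."""
    rng = random.Random(seed)
    total = 0
    for _ in range(num_slots):
        idle = rng.random() < mu
        if idle:
            sensed_free = rng.random() >= eps    # no false alarm
        else:
            sensed_free = rng.random() < delta   # miss detection
        # Reward 1 only if the channel is free and observed as free;
        # a transmission on a busy channel fails and yields 0.
        total += 1 if (idle and sensed_free) else 0
    return total / num_slots

if __name__ == "__main__":
    mu, eps, delta = 0.8, 0.1, 0.05  # hypothetical parameters
    print(average_reward(mu, eps, delta, num_slots=100000), (1 - eps) * mu)
```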

We usually evaluate the performance of a user k by its expected cumulated throughput after
t slots, defined as:

$$E[W_t^{(k)}] = E\left[\sum_{m=0}^{t-1} r_m^{(k)}\right] \qquad (10)$$

Notice that in this case, $r_t^{(k)}$ follows a Bernoulli distribution. An alternative representation of the
expected performance of the learning mechanism until the slot number t is described through the
notion of regret $R_t^{(k)}$ (or expected regret of the SU k). The regret is defined as the gap between
the maximum achievable performance in expectation and the expected cumulated throughput
achieved by the implemented policy:

$$R_t^{(k)} = \sum_{m=0}^{t-1} \max_{a_m \in \mathcal{A}_m^{(k)}} E[r_m^{(k)}] - E[W_t^{(k)}] \qquad (11)$$

where $\mathcal{A}_t^{(k)}$ denotes the subspace of channels that a given SU k can access at the slot time t,
$\mathcal{A}_t^{(k)} \subseteq \mathcal{D}$.
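
As a small worked illustration of the regret, the snippet below evaluates the gap accumulated by a given sequence of channel choices. It assumes that every channel is accessible at every slot, so that the benchmark is the best expected reward, and that the expected rewards are known to the evaluator. The function name and the numerical values are hypothetical.

```python
def cumulated_regret(lam, choices):
    """Sum over slots of (best expected reward - expected reward of the choice).

    lam[n]:  expected reward of channel n, e.g. (1 - eps_n) * mu_n
    choices: channels selected at slots 0, ..., t-1
    Assumes all channels are accessible at every slot.
    """
    best = max(lam)
    return sum(best - lam[a] for a in choices)

if __name__ == "__main__":
    lam = [0.72, 0.40, 0.10]      # hypothetical expected rewards
    choices = [0, 1, 2, 0, 0]     # hypothetical trajectory of one SU
    print(cumulated_regret(lam, choices))
```

Only the slots spent on sub-optimal channels contribute, which is why regret bounds are usually expressed through the expected number of pulls of those channels.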



B. Theoretical Results: Symmetric Networks 

In Symmetric Networks, the expected quality of a channel n observed by all SUs is the same:
$\forall k \in \mathcal{K}$, $\lambda_n^{(k)} = \lambda_n$. If the symmetry property is known to the SUs, all collected information on the
probed channels at the slot number t is relevant to every SU. Thus, it can be used to improve
their overall learning rate. As a matter of fact, in this context, the SUs combine at every iteration
all gathered rewards into one common information vector $i_t$ such that $i_t = \{i_{t-1}, \{a_t^{(k)}, r_t^{(k)}\}_{k \in \mathcal{K}}\}$.
Hence, the UCB indexes computed by the SUs at every slot number t are also the same, i.e.,
for all users $k \in \mathcal{K}$, $B_{T_n}^{(k)}(t) = B_{T_n}(t)$. Notice that in Symmetric Networks the optimal set of
channels $\mathcal{D}^*$ is composed of the K channels with the highest expected reward. Consequently a
simple Round Robin based coordination algorithm, as described in Coordination 2, is optimal
(it avoids harmful interference and is fair).
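
A minimal sketch of such a Round Robin allocation over the K best channels is shown below. It assumes that the common indexes are held fixed during the K slots of a cycle. The function name, the 0-based indexing and the example values are illustrative and are not taken from the paper.

```python
def round_robin_allocation(indexes, K, t):
    """Allocate the K best-ranked channels to K users at slot t.

    indexes[n] is the common index of channel n (identical for all users).
    Returns allocation, where allocation[k] is the channel of user k.
    """
    ranked = sorted(range(len(indexes)), key=lambda n: indexes[n], reverse=True)[:K]
    # User k takes the channel ranked (k + t) mod K: no two users collide,
    # and every user visits each of the K best channels once per cycle.
    return [ranked[(k + t) % K] for k in range(K)]

if __name__ == "__main__":
    indexes = [0.2, 0.9, 0.5, 0.7]  # hypothetical common indexes, N = 4
    for t in range(3):
        print(t, round_robin_allocation(indexes, K=3, t=t))
```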

In the next Theorem, we show that the regret of the k th SU in a Coordinated and Collaborative 
Symmetric Network is upper bounded by a logarithmic function of the number of iterations t. 



Theorem 1 (Upper Bound of the Regret): Let us consider $K \geq 1$ Symmetric Secondary Users
and $N \geq K$ Primary channels. The SUs are assumed to have limited observation abilities defined
by their parameters $\{\epsilon_n, \delta_n\}$ for every channel n. Assuming that the Secondary Network follows
the Coordination Policy 2 to select and access the primary channels, relying on the UCB1 algorithm
with parameter $\alpha > 1$, then every SU suffers an expected cumulated regret $R_t^{(k)}$, after t slots,
upper bounded by a logarithmic function of the iteration t:

$$R_t^{(k)} \leq \frac{1}{K} \sum_{n \notin \mathcal{D}^*} \frac{4\alpha (\lambda^* - \lambda_n)}{\Delta_n^2} \ln(t + K - 1) + o(\ln(t)) \qquad (12)$$

where the following notations were introduced:

$$\Delta_n = \min_{n' \in \mathcal{D}^*}\{\lambda_{n'}\} - \lambda_n$$



Proof: This proof relies on two main results stated and proven in Lemma 1 and Lemma 2
(cf. Appendix). As a matter of fact, Lemma 1 shows that the regret can be upper bounded by
a function of the expected number of pulls of sub-optimal channels:

$$R_t^{(k)} \leq \frac{1}{K} \sum_{n \notin \mathcal{D}^*} (\lambda^* - \lambda_n)\, E\left[T_n\left(\left\lfloor \frac{t}{K} \right\rfloor K + K - 1\right)\right] \qquad (13)$$

Then Lemma 2 upper bounds $E[T_n(\lfloor t/K \rfloor K + K - 1)]$ by a logarithmic function of the number of
iterations t:

$$E\left[T_n\left(\left\lfloor \frac{t}{K} \right\rfloor K + K - 1\right)\right] \leq \frac{4\alpha}{\Delta_n^2} \ln(t + K - 1) + o(\ln(t)) \qquad (14)$$

For the case K = 1, $\epsilon_n = \epsilon$ and $\delta_n = \delta$, we find the classic result stated in [18]:

$$E[T_n(t)] \leq \frac{4\alpha}{\left((1 - \epsilon)(\max_{n' \in \mathcal{D}}\{\mu_{n'}\} - \mu_n)\right)^2} \ln(t) + o(\ln(t)) \qquad (15)$$

C. Theoretical Results: Non-Symmetric Networks 

In the case of Non-Symmetric Networks, we can apply the upper bound provided in [23]. As a
matter of fact, our approach, which decomposes the learning step on the one hand and the
coordinating step on the other hand, is equivalent to the algorithm referred to as Learning with Linear Rewards
(LLR) in [23]. More specifically, the authors of [23] prove that, if the exploration parameter of
the UCB1 algorithm, i.e. the α factor, verifies the condition $\alpha > L$, where $L = N \wedge K = K$
and $\wedge$ refers to the minimum operator, then the LLR algorithm has an order optimal behavior
(i.e., expected cumulated regret upper bounded by a logarithmic function of the time). In our
case, the logarithmic regret scales linearly with the value $(N \wedge K)^3 N K$, as reported in [23].

However, fairness is not considered in [23]. Our suggested joint coordination-learning mechanism
alleviates this problem. It is easy to verify that the same results discussed in [23] also hold
when the Coordination algorithm 1 is used for spectrum selection. Consequently, a joint
coordination-learning mechanism in Non-Symmetric environments is order optimal.

Although this result is fundamental to many resource allocation problems under uncertainty, 
two questions remain unanswered in [23]: 

• With the result provided for Non-Symmetric Environments, it is obvious that the same 
mechanisms would also work for Symmetric Environments. Is it possible to provide tighter 
bounds for the regret and to use smaller values for the exploration parameter $\alpha$? 

• Although the theory constrains $\alpha$ to values larger than K (in our case), does it mean that 
the algorithm fails for smaller values? Notice that the larger K is, the longer it takes to 
converge. 

Both questions are tackled in this paper. On the one hand, the previous subsection tackled the 
first question. We see from the results of Theorem 1 that the logarithmic function scales as 
1/K, which considerably improves on the scaling found in the case of heterogeneous environments. 
On the other hand, the simulations discussed in Section VIII suggest a partial answer to the 
second question. 
VII. Information Sharing: Discussion 

An efficient communication process relies on reliable information exchange. Thus, we assume 
in this paper that the communication interface used by Cognitive Radio (CR) SUs to share 
information is a Common Control Channel (CCC). The CCC is used, on the one hand, between 
a transmitter and a receiver (which can be a secondary base station or another SU), and on 
the other hand, among all transmitters and receivers for cooperation purposes. The information 
transmitted through this channel is furthermore assumed to be received without errors. 

Thus from a Transmitter-Receiver's perspective, the purposes of the CCC are twofold: 
configuration adaptation and acknowledgment message transmission. 

1) Configuration adaptation: To initiate a transmission, both the transmitter and the receiver 
have to agree on a particular frequency band and on a communication configuration (e.g., 
modulation). In this particular case, configuration refers solely to the frequency band. Thus we 
assume that at every slot t the transmitter informs the receiver of the channel selection outcome 
before transmitting. 

2) Acknowledgment: At the end of every transmission attempt the receiver has to confirm 
the reception of the transmitted packet. In case of a successful transmission, the transmitter 
receives an ACK message from the receiver. Otherwise, in case of PU interference, it receives a 
NACK message. 

3) Information sharing: As mentioned in Section III-B, at the end of every slot t, and 
for cooperation purposes, a communication period is dedicated to sharing the computed reward 
information among SUs. As a consequence, a given SU can coordinate its behavior according 
to other SUs. Moreover, each SU can learn the spectrum occupation faster by relying on the 
outcomes of the other SUs' attempts, gathered on bands it did not address. 
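The reward sharing step can be pictured with the minimal Python sketch below, in which every SU reports the channel it accessed and the reward it computed, and all reports are folded into one common information vector. The `RewardReport` record and the `merge_reports` helper are hypothetical names introduced for illustration; the actual message format exchanged over the CCC is not specified here.

```python
from dataclasses import dataclass


@dataclass
class RewardReport:
    su_id: int      # reporting secondary user
    channel: int    # channel it accessed during the slot
    reward: float   # computed reward (assumed 1.0 on ACK, 0.0 otherwise)


def merge_reports(counts, sums, reports):
    """Fold the reports of one slot into the common information vector."""
    for report in reports:
        counts[report.channel] += 1
        sums[report.channel] += report.reward
    return counts, sums


# Three SUs report after one slot on a toy network of 4 channels.
counts, sums = [0] * 4, [0.0] * 4
merge_reports(
    counts,
    sums,
    [RewardReport(0, 2, 1.0), RewardReport(1, 3, 0.0), RewardReport(2, 2, 0.0)],
)
# Empirical mean per channel; None for channels nobody has reported on yet.
means = [s / c if c else None for s, c in zip(sums, counts)]
```

With such a shared vector, an SU obtains samples for channels it did not access itself, which is the mechanism behind the faster learning mentioned above.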

VIII. Empirical Evaluation: Simulation Results 

In this section, we describe and show the simulation results aimed at illustrating the herein 
suggested resource selection mechanisms. We first describe the general experimental protocol 
and the considered scenarios in Subsection VIII-A. Subsection VIII-B presents and discusses the 
simulation results pertaining to the regret analysis. Subsection VIII-C shows the results pertaining 
to the secondary network performance analysis. 

4 Whether or not to use CCCs for cognitive radio networks is still a matter of debate in the CR community. This debate is 
however out of the scope of this paper. Notice that the conclusions of this study would still apply if we assumed any other kind 
of reliable information exchange interface among secondary users. 



A. Scenario and experimental protocol for the regret analysis 

We consider 3 secondary users willing to exploit 10 primary channels with unknown expected 
occupancy patterns $\mu = \{\mu_n\}_{n \in \{1, \dots, 10\}}$. For the sake of generality, we do not provide explicit 
numerical values for the PUs' channel occupancy and for the probability of false alarm. The impact 
of sensing errors has been analyzed and illustrated in a previous work [18]. 

We denote by $\lambda_n^{(k)}$ the expected reward of a resource n observed by a user k. We, however, 
consider that the occupation state of a resource n observed by a user k at the slot t follows a 
Bernoulli distribution with parameter $\lambda_n^{(k)}$. Thus, the application to OSA related scenarios is 
straightforward as: $\lambda_n^{(k)} = (1 - \epsilon_n^{(k)})\,\mu_n$ in this context. 

For illustration purposes we tackle two scenarios. On the one hand, we consider 3 symmetric 
users. On the other hand, we consider that the 3 secondary users are divided into 2 sets: 
two symmetric users sharing the spectrum with a last secondary user whose optimal channel does 
not belong to the set of optimal channels of the other secondary users, such that: 

Scenario 1 (Symmetric network): We consider a quality matrix $\Lambda$ defined as: 

0.1 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 
0.1 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 
0.1 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 

Scenario 2 (Non-symmetric network): We consider a quality matrix $\Lambda$ defined as: 

0.1 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 
0.1 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 
0.1 0.1 0.2 0.3 0.4 0.7 0.9 0.7 0.7 0.6 

These scenarios aim at illustrating both Hungarian and Round Robin based coordination 
algorithms. We expect the channel selection algorithm, relying on both learning and coordination 
mechanisms, to be able to converge to the set of optimal channels in Scenario 1. However, in 
Scenario 2 only the Hungarian algorithm based coordinator is illustrated, as a Round Robin 
approach would be inefficient. 
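The allocations that a coordinator should converge to in both scenarios can be checked with the short Python sketch below. It uses an exhaustive search over all one-channel-per-user assignments as a simple stand-in for the Hungarian algorithm (this is not the coordinator used in the paper, only a brute-force check that is affordable for 3 users and 10 channels). Channels are numbered from 0, and `best_assignment` is a hypothetical helper name.

```python
from itertools import permutations


def best_assignment(quality):
    """Exhaustively search the one-channel-per-user assignment that
    maximizes the total expected reward (brute-force stand-in)."""
    n_channels = len(quality[0])
    best, best_total = None, float("-inf")
    for channels in permutations(range(n_channels), len(quality)):
        total = sum(quality[k][n] for k, n in enumerate(channels))
        if total > best_total:
            best, best_total = channels, total
    return best, best_total


row = [0.1, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
scenario_1 = [row, row, row]
scenario_2 = [row, row, [0.1, 0.1, 0.2, 0.3, 0.4, 0.7, 0.9, 0.7, 0.7, 0.6]]

assign_1, total_1 = best_assignment(scenario_1)   # three best common channels
assign_2, total_2 = best_assignment(scenario_2)   # third user leaves the common set
```

In Scenario 1 the three users share the same three best channels, so rotating them costs nothing. In Scenario 2 the third user's best channel lies outside that set, which is why a plain rotation over a common set is not appropriate there.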
During all experiments, the learning parameter $\alpha$ is set to $\alpha = 1.1$ (to respect the 
conditions of Theorem 1). Notice that these simulations were designed so that their respective 
results and conclusions could be generalized to more complex scenarios. 

Finally, the presented results are averaged over 30 experiments with a final horizon equal to 
1 000 000 slots to obtain reliable results. 



B. Simulation results: Regret Analysis 



The regrets of four algorithms, averaged over the number of SUs, are illustrated in Figures 1(a) 
and 1(b) in the context of Scenario 1. Figure 1(a) shows the regrets of the Hungarian algorithm 
with and without a common information vector (i.e., with or without collaborative 
learning), while Figure 1(b) corresponds to Round Robin based coordination algorithms with a 
common information vector. In this latter case, one algorithm updates its information vector 
every 3 iterations (i.e., every K iterations as considered in Theorem 1), while the second one 
updates its information vector every slot. 

On the one hand, Figure 1(b) illustrates Theorem 1. As a matter of fact, we observe that the 
regrets of Round Robin based algorithms are similar and indeed have a logarithmic-like behavior 
as a function of the slot number. This behavior is observed for all four simulated algorithms. 
Secondly, as expected, the Hungarian based coordinator with collaborative learning performs as 
well as Round Robin based coordinators. 
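A scaled-down version of this experiment can be sketched in Python as follows. This is only an illustrative simulation under several assumptions, not the code used to produce the figures: the common information vector is refreshed once per round of K slots, the Round Robin rotation gives SU k the channel at position (k + p) mod K at play p, the index is the empirical mean plus sqrt(α ln(t) / T_n), and only the regret of one SU is tracked. The function name `simulate_regret` is hypothetical.

```python
import math
import random


def simulate_regret(lams, K, alpha, rounds, seed=0):
    """Cumulated regret of one SU under UCB1 with a shared information vector."""
    rng = random.Random(seed)
    N = len(lams)
    lam_star = sum(sorted(lams, reverse=True)[:K]) / K   # mean of the K best channels
    # Common information vector, initialized with one sample per channel.
    counts = [1] * N
    sums = [float(rng.random() < lam) for lam in lams]
    regret = 0.0
    for m in range(1, rounds + 1):
        t = m * K                                        # slot index at the start of round m
        index = [
            sums[n] / counts[n] + math.sqrt(alpha * math.log(t) / counts[n])
            for n in range(N)
        ]
        best = sorted(range(N), key=lambda n: index[n], reverse=True)[:K]
        for p in range(K):                               # K plays per round
            for k in range(K):                           # each SU gets its own channel
                n = best[(k + p) % K]
                counts[n] += 1
                sums[n] += float(rng.random() < lams[n])
                if k == 0:                               # track the regret of SU 0 only
                    regret += lam_star - lams[n]
    return regret


lams = [0.1, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
regret = simulate_regret(lams, K=3, alpha=1.1, rounds=5000)
# Regret of a uniformly random channel choice over the same 15000 slots.
random_baseline = 3 * 5000 * (0.8 - sum(lams) / len(lams))
```

Plotting the value returned for increasing numbers of rounds gives a rough picture of the logarithmic-like growth discussed above, to be compared with the linear growth of the random baseline.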



On the other hand, Figure 1(a) shows the impact of coordination with individual learning 
(the shared information is only used for coordination purposes). In this case the regret grows, 
as expected, larger by a factor approximately equal to K. In this case, where the users are 
symmetric but unaware of that fact, they do not exploit other users' information to increase their 
respective learning rate. The information collected from their neighbors is solely used to compute 
the quality matrix $\Lambda$ to enable coordination. Thus, we observe in Figure 1(a) that the Hungarian 
algorithm is still able to handle this situation, although, as already noticed, with a loss of 
performance. 

In the case of Scenario 2, Round Robin based coordination algorithms are in general not 
efficient. Consequently, we do not illustrate them in this context. Figure 2 shows the proportion 
of time the Hungarian algorithm based coordinator allocates the different secondary users to their 
respective optimal sets. We can observe that the curves increase rather quickly, which indicates 
that the algorithm allocates the SUs to their respective optimal sets most of the time after a first 
learning phase. Theoretical analysis as well as testbed-based experiments are currently under 
investigation to confirm these results. 

Fig. 1. Collaboration, Learning and Coordination in the case of Symmetric Networks: averaged regret. The simulation results 
show that both Hungarian algorithm and Round Robin based coordinators can efficiently learn to allocate the resources among the 
SUs. All curves are computed with $\alpha = 1.1$. The left panel (a) shows the impact of collaboration on the learning process in 
symmetric networks. The right panel (b) compares learning mechanisms with either Hungarian coordination or Round Robin 
coordination. We notice that their performance is quite similar. 

Fig. 2. Percentage of time the Hungarian algorithm based coordinator allocates the different secondary users to their respective 
optimal sets. The exploration parameter $\alpha$ is chosen equal to 1.1. This value is smaller than the minimum value suggested by 
the theory. We observe however that the algorithm remains consistent. 

C. Simulation results: Network Performance Analysis 

In this subsection, we evaluate the performance of the joint coordination and cooperative learning 
scheme from the point of view of secondary network performance. To this aim, we model a 
primary network with N = 10 channels, and a secondary network composed of K = 4 transmitter 
nodes. The temporal occupation pattern of the N channels is defined by the following vector $\theta$ of 
Bernoulli distribution parameters: {0.1, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}. All the SUs have 
a fixed probability of sensing miss-detection and sensing false-alarm, i.e., a Symmetric Network 
scenario is considered. Unless specified otherwise, we set $\delta_n^{(k)} = 0.2$, for all SU k and channel n. 
At each slot, each SU k selects a channel to sense, and transmits a packet of 1000 bytes if the 
channel is found idle. No transmission attempt is performed in case the channel is sensed occupied 
by a PU. Both the interference among SUs and the interference between an SU and a PU are taken 
into account in the model. If no interference occurs during the SU transmission, then an ACK 
message is sent back to the SU transmitter. Otherwise, the data packet is discarded by the SU 
receiver node. Thus, at each slot t, each SU k can experience a local throughput $TP_k(t)$ equal to 
0 or 1000 bytes, based on interference and sensing conditions. The average network throughput 
NTP(t) is defined as the average amount of bytes successfully transmitted in the secondary network 
at each slot t, i.e.: 

$$NTP(t) = \mathbb{E}\left[\sum_{k=1}^{K} TP_k(t)\right]$$
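The per-slot throughput model described above can be sketched in Python as follows. The helper `slot_throughput` and its argument names are hypothetical; the sketch only encodes the rules stated in the text (an SU transmits when it senses the channel idle, and a packet is delivered only if neither a PU nor another transmitting SU occupies the same channel).

```python
from collections import Counter

PACKET_BYTES = 1000


def slot_throughput(selected, pu_busy, sensed_idle):
    """Return TP_k(t) for every SU k during one slot.

    selected[k]    : channel chosen by SU k
    pu_busy[n]     : True if a PU occupies channel n during the slot
    sensed_idle[k] : True if SU k sensed its channel as idle (possibly wrongly)
    """
    # Number of SUs that actually transmit on each channel.
    transmitting = Counter(n for k, n in enumerate(selected) if sensed_idle[k])
    throughput = []
    for k, n in enumerate(selected):
        delivered = sensed_idle[k] and not pu_busy[n] and transmitting[n] == 1
        throughput.append(PACKET_BYTES if delivered else 0)
    return throughput


# SU 0 succeeds; SUs 1 and 2 collide on channel 1; SU 3 hits a busy channel
# that it wrongly sensed as idle (miss-detection).
tp_collision = slot_throughput([0, 1, 1, 2], [False, False, True], [True] * 4)

# No collision: SU 2 is blocked by a PU, SU 3 refrains from transmitting.
tp_clean = slot_throughput(
    [0, 1, 2, 3], [False, False, True, False], [True, True, True, False]
)
```

Summing the returned values over the SUs and averaging over many simulated slots or runs gives an empirical estimate of NTP(t).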

We consider four different configurations of learning, cooperation and coordination schemes in 
our analysis: 

• S1 (Random, No Learning): no learning is employed by SUs. At each slot, each SU chooses 
randomly the channel to sense among the available N channels. 

• S2 (Individual Learning, No Coordination): each SU employs the UCB1 algorithm to learn 
the temporal channel usage. No coordination and collaboration mechanisms are used. At 
each slot t, each SU k chooses a channel randomly based on the local UCB1 index associated 
to each channel. More specifically, the probability to select channel n is computed proportional 
to $1 - B_{n,t}^{(k)}$. The probabilities are normalized so that their values lie between 0 and 1, and 
their sum equals one. 

• S3 (Cooperative Learning, No Coordination): as before, each SU employs the UCB1 
algorithm to learn the temporal channel usage, and shares the rewards received at each 
slot t. However, no coordination mechanism is used. The channel selection is performed 
as in the previous case. 

• S4 (Cooperative Learning, Coordination): the complete Channel Selection Policy 1 described 
in Section IV is evaluated. The Round Robin algorithm is considered for channel access 
coordination. 
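The randomized channel selection used by the two schemes without coordination can be sketched as below. The per-channel weights are hypothetical inputs standing for whatever non-negative quantity is derived from the local index of each channel; the sketch only illustrates the normalization into a probability vector and the random draw, and the uniform fallback for all-zero weights is an added assumption.

```python
import random


def selection_probabilities(weights):
    """Normalize non-negative per-channel weights into probabilities."""
    total = sum(weights)
    if total <= 0:
        # Assumed fallback: uniform choice when every weight is zero.
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def choose_channel(weights, rng=random):
    """Draw one channel index according to the normalized weights."""
    probs = selection_probabilities(weights)
    return rng.choices(range(len(weights)), weights=probs, k=1)[0]


probs = selection_probabilities([1.0, 3.0])
only_option = choose_channel([0.0, 0.0, 5.0])
```

Because every SU draws independently, several SUs may pick the same channel in a slot, which is the source of the SU-to-SU collisions that the coordinated scheme avoids.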

Figure 3(a) shows the network throughput as a function of the time slot t, averaged over 1000 
simulation runs. As expected, the S1 scheme experiences the lowest throughput, since it does not 
take into account any mechanism to prevent SU and PU interference. On the other hand, both 
S2 and S3 schemes employ learning mechanisms to derive the PU occupation patterns of each 
channel, and thus are able to mitigate the interference caused by incumbent PU transmissions. 

Moreover, Figure 3(a) shows that the S3 scheme slightly improves on the S2 scheme, since the 
usage of a collaborative mechanism with reward sharing reduces the occurrence of wrong channel 
selection events due to local sensing errors. However, neither the S2 nor the S3 scheme includes 
coordination mechanisms, and both thus suffer from packet losses caused by SU interference, i.e., 
by the fact that multiple SU transmitters are allocated on the same channel. The S4 scheme nullifies 
the harmful interference among SUs through Round Robin coordination, and thus provides 
the highest performance. Figure 3(b) shows the average network throughput as a function 
of the number of SU transmitters in the network. Again, Figure 3(b) shows that the joint 
cooperative learning and coordination scheme provides the highest performance over all the 
scenarios considered. 



Fig. 3. The network throughput over simulation time in a scenario with K = 4 is shown in Figure 3(a). The network throughput 
as a function of the number of SUs (i.e., K) is shown in Figure 3(b). Legend: S1 - Random, No Learning; S2 - Individual 
Learning, No Coordination; S3 - Collaborative Learning, No Coordination; S4 - Collaborative Learning, Coordination. 

IX. Conclusion 

In this paper, we have addressed the problem of Opportunistic Spectrum Access (OSA) in 
coordinated secondary networks. We have formulated the problem as a cooperative learning task 
where SUs can share their information about spectrum availability. We have analyzed the case 
of symmetric secondary networks, and we have provided some fundamental results on the 
performance of cooperative learning schemes. Moreover, we have proposed a general coordination 
mechanism based on the Hungarian algorithm to address the general case (i.e., both symmetric 
and asymmetric networks). We are planning to validate our approach on cooperative learning 
schemes through further theoretical analysis and Cognitive Radio testbed-based implementations. 
Appendix: Proofs 

We introduce and prove in this section technical results used to justify the important results 
stated in this paper. 
Lemma 1 (Regret, general upper bound): Let us consider $K \geq 1$ Symmetric Secondary Users 
and N > K Primary channels. The SUs are assumed to have limited observation abilities defined 
by their parameters $\{\epsilon_n, \delta_n\}$ for every channel n. Assuming that the Secondary Network follows 
the Coordination Policy 2 to select and access the primary channels, relying on the UCB1 algorithm 
with parameter $\alpha > 1$, then every SU suffers, after t slots, an expected cumulated regret $R_t^{(k)}$ 
upper bounded such that: 

$$R_t^{(k)} \leq \sum_{n \notin \mathcal{D}^*} (\lambda^* - \lambda_n)\, \frac{\mathbb{E}\left[T_n\left(\lfloor t/K \rfloor K + K - 1\right)\right]}{K} \tag{16}$$

where $\mathbb{E}\left[T_n(t)\right]$ refers to the expected number of pulls of a given channel n (by all SUs), and 
where the following notations were introduced: 

$$\lambda_n = (1 - \epsilon_n)\,\mu_n, \qquad \lambda^* = \frac{1}{K} \sum_{n \in \mathcal{D}^*} \lambda_n$$
Proof: We can upper bound the regret of a user k as defined in Equation 11 by the regret 
that he suffers at the end of the considered round of K plays, i.e., 

$$R_t^{(k)} \leq \sum_{m=0}^{\lfloor t/K \rfloor} \sum_{p=0}^{K-1} \left(\lambda^* - \mathbb{E}\left[r_{Km+p}^{(k)}\right]\right)$$

where the sum $\sum_{p=0}^{K-1} \left(\lambda^* - \mathbb{E}\left[r_{Km+p}^{(k)}\right]\right)$, which refers to the cumulated loss during the round 
of K plays indexed by the round number m, can also be written as a double sum over the 
sub-optimal channels $n \notin \mathcal{D}^*$ and over the K plays $p = 0, \dots, K-1$ of the round, 
which justifies the second inequality. Notice that this sum is positive if and only if at least one 
sub-optimal channel, $n \notin \mathcal{D}^*$, is selected among the best K channels to be played during the 
round m. 

Thus we can further upper bound the regret as follows: 

$$R_t^{(k)} \leq \sum_{m=0}^{\lfloor t/K \rfloor} \sum_{n \notin \mathcal{D}^*} (\lambda^* - \lambda_n)\, \mathbb{P}\left(n \in A_{Km}^{(k)}\right)$$

where $A_{Km}^{(k)}$ refers to the K channels with the highest indexes evaluated at the round number 
m by the k-th SU. An inversion of the two sums leads to the following inequality: 

$$R_t^{(k)} \leq \sum_{n \notin \mathcal{D}^*} (\lambda^* - \lambda_n) \sum_{m=0}^{\lfloor t/K \rfloor} \mathbb{P}\left(n \in A_{Km}^{(k)}\right) \tag{17}$$

Finally, we notice that the three following equalities are verified: 

$$\sum_{m=0}^{\lfloor t/K \rfloor} \mathbb{P}\left(n \in A_{Km}^{(k)}\right) = \mathbb{E}\left[\sum_{m=0}^{\lfloor t/K \rfloor} \mathbb{1}_{\{n \in A_{Km}^{(k)}\}}\right]$$

$$\sum_{m=0}^{\lfloor t/K \rfloor} \mathbb{1}_{\{n \in A_{Km}^{(k)}\}} = T_n^{(k)}\left(\lfloor t/K \rfloor K + K - 1\right)$$

$$T_n^{(k)}\left(\lfloor t/K \rfloor K + K - 1\right) = T_n\left(\lfloor t/K \rfloor K + K - 1\right) / K$$

where the second equality can be read as: the number of times a channel n is selected by a 
user k, until the slot number $\lfloor t/K \rfloor K + K - 1$, is equal to the number of rounds in which the event 
$\{n \in A_{Km}^{(k)}\}$ is verified. The third equality, on the other hand, reminds us that in the context of 
symmetric users, all SUs share the same information vector and obtain the same index values. 
Consequently, if a channel n is selected at a given round, it is played exactly once by every SU. 
In other words, the channel is selected K times during a round of K plays. 

Thus substituting and combining the three previous equalities with Equation 17 leads to the 
stated result and ends this proof. ■ 
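The counting argument behind the third equality can be illustrated with a small Python sketch. It assumes a cyclic Round Robin rotation in which SU k plays the channel at position (k + p) mod K at play p of the round; the rotation rule and the helper name `round_robin_schedule` are assumptions made for illustration.

```python
def round_robin_schedule(best_channels):
    """schedule[p][k] is the channel accessed by SU k at play p of the round."""
    K = len(best_channels)
    return [[best_channels[(k + p) % K] for k in range(K)] for p in range(K)]


schedule = round_robin_schedule([7, 8, 9])
pulls_by_su0 = [plays[0] for plays in schedule]              # channels visited by SU 0
total_pulls_of_7 = sum(plays.count(7) for plays in schedule)  # pulls of channel 7 by all SUs
```

Each SU visits every selected channel exactly once per round, and each selected channel is pulled K times in total, so the per-user count is the overall count divided by K.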

Lemma 2: Let us consider $K \geq 1$ Symmetric Secondary Users and N > K Primary channels. 
The SUs are assumed to have limited observation abilities defined by their parameters $\{\epsilon_n, \delta_n\}$ 
for every channel n. Assuming that the Secondary Network follows the Coordination Policy 2 
to select and access the primary channels, relying on the UCB1 algorithm with parameter $\alpha > 1$, 
then every suboptimal channel n, after t slots, has an expected number of pulls upper bounded 
by a logarithmic function of the number of iterations such that: 

$$\mathbb{E}\left[T_n\left(\lfloor t/K \rfloor K + K - 1\right)\right] \leq \frac{4\alpha}{\Delta_n^2} \ln(t + K - 1) + o(\ln(t)) \tag{18}$$

Proof: We start by a first coarse upper bound verified for all $u_n \in \mathbb{N}$: since every channel 
is to be sensed at least K times, we can write: 




(19) 




We can write: 




(20) 
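In this last inequality, the joint event $\{T_n(Km) \geq u_n + 1\}$ is left implicit to ease the notations. 
This will be the case in the next assertion. Moreover notice that: $\forall n \in \mathcal{D}^*$, $K \leq T_n(Km) \leq m$. 
Since for all $\tau \in \mathbb{R}^+$ we have the following event inclusion: 

$$\left\{B_{T_n}(Km) \geq \min_{n' \in \mathcal{D}^*}\{B_{T_{n'}}(Km)\}\right\} \subset \left\{B_{T_n}(Km) \geq \tau\right\} \cup \left\{\min_{n' \in \mathcal{D}^*}\{B_{T_{n'}}(Km)\} < \tau\right\} \tag{21}$$

We can write: 

$$\mathbb{E}\left[T_n\left(\lfloor t/K \rfloor K + K - 1\right)\right] \leq K + u_n + K \sum_{m=u_n+K}^{\lfloor t/K \rfloor} \mathbb{P}\left(B_{T_n}(Km) \geq \tau\right) + K \sum_{m=u_n+K}^{\lfloor t/K \rfloor} \mathbb{P}\left(\min_{n' \in \mathcal{D}^*}\{B_{T_{n'}}(Km)\} < \tau\right) \tag{22}$$

For the rest of the proof we assume that: 

$$u_n = \left\lceil \frac{4\alpha \ln\left(\lfloor t/K \rfloor K + K - 1\right)}{\Delta_n^2} \right\rceil, \qquad \tau = \min_{n' \in \mathcal{D}^*}\{\lambda_{n'}\} \tag{23}$$

then we prove that: 

$$\sum_{m=u_n+K}^{\lfloor t/K \rfloor} \mathbb{P}\left(B_{T_n}(Km) \geq \tau\right) = o(\ln(t)), \qquad \sum_{m=u_n+K}^{\lfloor t/K \rfloor} \mathbb{P}\left(\min_{n' \in \mathcal{D}^*}\{B_{T_{n'}}(Km)\} < \tau\right) = o(\ln(t)) \tag{24}$$

First, we start by the following term: $\mathbb{P}\left(B_{T_n}(Km) \geq \tau\right)$. Notice that if the event (including 
its implicit event) $\{B_{T_n}(Km) \geq \tau;\ T_n(Km) \geq u_n + 1\}$ is verified then there exists an integer 
$s$: $u_n + 1 \leq s \leq m$ such that the real value verifies $B_s(Km) \geq \tau$. Consequently, we can write: 

$$\mathbb{P}\left(B_{T_n}(Km) \geq \tau;\ T_n \geq u_n + 1\right) \leq \sum_{s=u_n+1}^{m} \mathbb{P}\left(B_s(Km) \geq \tau\right) \tag{25}$$

Considering an index value computed as detailed in Equations 1 and 2, we can write: 

$$\mathbb{P}\left(B_s(Km) \geq \tau\right) = \mathbb{P}\left(W_s(Km) \geq \tau - A_s(Km)\right) = \mathbb{P}\left(W_s(Km) - \lambda_n \geq \tau - \lambda_n - A_s(Km)\right) \tag{26}$$

Since $s \geq u_n + 1$, then: 

$$\tau - \lambda_n - A_s(Km) \geq \Delta_n - \sqrt{\frac{\alpha \ln(Km)}{u_n}} \geq \frac{\Delta_n}{2}$$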
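Consequently, we can write: 

$$\mathbb{P}\left(B_s(Km) \geq \tau\right) \leq \mathbb{P}\left(W_s(Km) - \lambda_n \geq \frac{\Delta_n}{2}\right) \tag{27}$$

$$\leq e^{-\frac{\Delta_n^2}{2} s} \tag{28}$$

$$\leq e^{-2\alpha \ln(mK + K - 1)} \leq \frac{1}{(mK)^{2\alpha}} \tag{29}$$

where the second inequality is a concentration inequality known as Hoeffding's inequality [25]. 
The third inequality is once again due to the inequality $s \geq u_n + 1$. Finally, assuming that $\alpha > 1$, 

$$\sum_{m=u_n+K}^{\lfloor t/K \rfloor} \mathbb{P}\left(B_{T_n}(Km) \geq \tau\right) \leq \sum_{m=u_n+K}^{\lfloor t/K \rfloor} \frac{m}{(mK)^{2\alpha}} \tag{30}$$

$$\leq \sum_{m=u_n+K}^{\infty} \frac{1}{m^{2\alpha-1} K^{2\alpha}} \tag{31}$$

$$= C_{n,\alpha} = o(\ln(t)) \tag{32}$$

where $C_{n,\alpha}$ exists for $\alpha > 1$, is finite and is defined as the limit of the Riemann series $\sum_{m=u_n+K}^{\infty} \frac{1}{m^{2\alpha-1} K^{2\alpha}}$. 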
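For reference, the two standard facts used in this step can be written out explicitly. The block below states Hoeffding's inequality for the empirical mean of rewards bounded in [0, 1], shows how it produces a polynomially decaying term once the number of samples exceeds a logarithmic threshold (the threshold written here is an assumption chosen to be consistent with the role played by $u_n$), and recalls the convergence condition of the resulting series.

```latex
% Hoeffding's inequality for the empirical mean W_s of s i.i.d. rewards
% in [0,1] with expectation \lambda_n:
\mathbb{P}\left(W_s - \lambda_n \geq x\right) \leq e^{-2 x^2 s}, \qquad x > 0.

% With x = \Delta_n / 2 and s \geq 4\alpha \ln(Km) / \Delta_n^2:
e^{-2 \left(\Delta_n / 2\right)^2 s}
  = e^{-\frac{\Delta_n^2}{2} s}
  \leq e^{-2\alpha \ln(Km)}
  = (Km)^{-2\alpha}.

% Convergence of the tail series:
\sum_{m \geq 1} \frac{1}{m^{2\alpha - 1}} < \infty
  \iff 2\alpha - 1 > 1
  \iff \alpha > 1.
```

This is why the condition α > 1 appears in the statement: it is exactly what makes the summed deviation probabilities a finite constant rather than a term growing with t.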

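We deal now with the following term: $\mathbb{P}\left(\min_{n' \in \mathcal{D}^*}\{B_{T_{n'}}(Km)\} < \tau\right)$ (including the implicit 
event). In order to avoid confusing optimal channels and sub-optimal channels, for the rest of this 
proof, we denote by $n^*$ a channel that belongs to the optimal set $\mathcal{D}^*$. As for the previous proof, 
and since for any $n^* \in \mathcal{D}^*$, $K \leq T_{n^*} \leq m$, if the event $\{\min_{n^* \in \mathcal{D}^*}\{B_{T_{n^*}}(Km)\} < \tau;\ K \leq T_{n^*} \leq m\}$ 
is verified then there exists a channel $n^* \in \mathcal{D}^*$ and an integer $s_{n^*}$: $K \leq s_{n^*} \leq m$ such 
that the real value verifies $B_{s_{n^*}}(Km) < \tau$. To ease notations we introduce $P_{n^*}$, the probability 
of the considered event: 

$$P_{n^*} = \mathbb{P}\left(\min_{n^* \in \mathcal{D}^*}\{B_{T_{n^*}}(Km)\} < \tau;\ T_{n^*} \geq K\right)$$

Consequently we can write: 

$$P_{n^*} \leq \sum_{n^* \in \mathcal{D}^*} \sum_{s_{n^*}=K+1}^{m} \mathbb{P}\left(B_{s_{n^*}}(Km) < \tau\right) \tag{33}$$

Notice that for any $n^* \in \mathcal{D}^*$: 

$$\min_{n' \in \mathcal{D}^*}\{\lambda_{n'}\} - \lambda_{n^*} \leq 0$$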
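Consequently, as for the previous proof, relying on Hoeffding's inequality, we can write: 

$$P_{n^*} \leq \sum_{n^* \in \mathcal{D}^*} \sum_{s_{n^*}=K+1}^{m} \mathbb{P}\left(W_{s_{n^*}}(Km) - \lambda_{n^*} \leq -A_{s_{n^*}}(Km)\right) \tag{34}$$

$$\leq \sum_{s_{n^*}=K+1}^{m} K e^{-2 A_{s_{n^*}}^2(Km)\, s_{n^*}} \tag{35}$$

$$\leq \sum_{s_{n^*}=K+1}^{m} K e^{-2\alpha \ln(Km)} \tag{36}$$

$$\leq \frac{1}{(Km)^{2\alpha-1}} \tag{37}$$

Finally, we can write: 

$$\sum_{m=u_n+1}^{\lfloor t/K \rfloor} P_{n^*} \leq \sum_{m=u_n+1}^{\lfloor t/K \rfloor} \frac{1}{(Km)^{2\alpha-1}} \tag{38}$$

$$\leq \sum_{m=u_n+1}^{\infty} \frac{1}{(Km)^{2\alpha-1}} \tag{39}$$

$$= C_{n^*,\alpha} = o(\ln(t)) \tag{40}$$

where $C_{n^*,\alpha}$ exists for $\alpha > 1$, is finite and is defined as the limit of the Riemann series $\sum_{m=u_n+1}^{\infty} \frac{1}{(Km)^{2\alpha-1}}$. 

Finally, since $\lfloor t/K \rfloor K + K - 1 \leq t + K - 1$, combining Inequalities 22, 32 and 40 we can 
finally write: 

$$\mathbb{E}\left[T_n\left(\lfloor t/K \rfloor K + K - 1\right)\right] \leq \frac{4\alpha}{\Delta_n^2} \ln(t + K - 1) + o(\ln(t)) \tag{41}$$

Which ends the proof. ■ 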



